
where $f(\cdot)$ is the nearest-neighbor interpolation. Therefore, we formulate the learning objective for feature refinement as

$$\arg\min_{a_L,\,a_H}\,\max_{W_D}\;\big\{\mathcal{L}^F_{\mathrm{Adv}}(a_L, a_H, W_D) + \mathcal{L}^F_{\mathrm{MSE}}(a_L, a_H)\big\}_{i \in N}, \tag{6.13}$$

where $\mathcal{L}^F_{\mathrm{Adv}}(a_L, a_H, W_D)$ is the adversarial loss, defined as

$$\mathcal{L}^F_{\mathrm{Adv}}(a_L, a_H, W_D) = \log(D(a_H; W_D)) + \log(1 - D(a_L; W_D)), \tag{6.14}$$

where $D(\cdot)$ consists of several basic blocks, each with a fully connected layer and a LeakyReLU layer. In addition, we adopt several discriminators to refine the features during the binarization training process.
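As a concrete illustration, the following is a minimal PyTorch sketch of such a discriminator; the number of blocks and the hidden widths are assumptions, since the text does not specify them:

```python
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    """D(.; W_D): a stack of basic blocks, each a fully connected layer
    followed by LeakyReLU, ending in a sigmoid score in (0, 1).
    Block count and hidden widths are illustrative assumptions."""

    def __init__(self, in_dim, hidden_dims=(512, 256)):
        super().__init__()
        layers, prev = [], in_dim
        for h in hidden_dims:
            layers += [nn.Linear(prev, h), nn.LeakyReLU(0.2)]
            prev = h
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, a):
        # Flatten feature maps (N, C, H, W) to vectors before scoring.
        return self.net(a.flatten(1))
```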

Moreover, $\mathcal{L}^F_{\mathrm{MSE}}(a_L, a_H)$ is the feature loss between the low-level and high-level features, expressed as the MSE

$$\mathcal{L}^F_{\mathrm{MSE}}(a_L, a_H) = \frac{\mu}{2}\,\|a_L - a_H\|_2^2, \tag{6.15}$$

where $\mu$ is a balancing hyperparameter.
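Putting Eqs. 6.14 and 6.15 together, the feature refinement terms might be computed as in the sketch below; the helper name and the value of `mu` are assumptions, `F.interpolate` plays the role of $f(\cdot)$, and the two feature maps are assumed to have matching shapes after interpolation:

```python
import torch
import torch.nn.functional as F

def feature_refinement_losses(a_low, a_high, disc, mu=0.1):
    """L^F_Adv (Eq. 6.14) and L^F_MSE (Eq. 6.15) for one feature pair.
    a_low, a_high: low-/high-level feature maps (N, C, H, W);
    disc: discriminator D(.; W_D); mu: balancing hyperparameter
    (its value here is an assumption)."""
    # f(.): nearest-neighbor interpolation aligns a_high spatially with a_low.
    a_high = F.interpolate(a_high, size=a_low.shape[-2:], mode="nearest")
    eps = 1e-8  # numerical safety term, not part of the original formulation
    l_adv = (torch.log(disc(a_high) + eps)
             + torch.log(1.0 - disc(a_low) + eps)).mean()
    l_mse = 0.5 * mu * (a_low - a_high).pow(2).sum()
    return l_adv, l_mse
```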

6.2.4 Optimization

For a specific task, the conventional problem-dependent loss $\mathcal{L}_S$, e.g., the cross-entropy, is considered; thus the learning objective is defined as

$$\arg\min_{w_i,\,\alpha_i,\,p_i}\;\big\{\mathcal{L}_S(w_i, \alpha_i, p_i)\big\}_{i \in N}, \tag{6.16}$$

where $p_i$ denotes the other parameters of the BNN, e.g., the parameters of BN and PReLU. Therefore, the general learning objective of BiRe-ID combines Eqs. 6.7–6.9, 6.13, and 6.16. For each convolutional layer, we sequentially update $w_i$, $\alpha_i$, and $p_i$.
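A schematic training step for this sequential update might look as follows; the split into three optimizers (one per parameter group) is an assumption about how "sequentially update" is realized, with the loss recomputed after each partial update:

```python
import torch

def train_step(loss_fn, opt_w, opt_alpha, opt_p):
    """One sequential update of w_i, then alpha_i, then p_i.
    loss_fn: closure returning L_S plus the refinement terms;
    opt_w/opt_alpha/opt_p: optimizers over the respective parameter
    groups (this three-optimizer split is an illustrative assumption)."""
    for opt in (opt_w, opt_alpha, opt_p):
        opt.zero_grad()   # clear this group's stale gradients
        loss = loss_fn()  # recompute L_S + refinement losses
        loss.backward()
        opt.step()        # update only this parameter group
```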

Updating $w_i$: Consider $\delta_{w_i}$ as the gradient of the real-valued kernels $w_i$. Thus,

$$\delta_{w_i} = \frac{\partial \mathcal{L}}{\partial w_i} = \frac{\partial \mathcal{L}_S}{\partial w_i} + \frac{\partial \mathcal{L}^K_{\mathrm{Adv}}}{\partial w_i} + \frac{\partial \mathcal{L}^F_{\mathrm{Adv}}}{\partial w_i} + \frac{\partial \mathcal{L}^K_{\mathrm{MSE}}}{\partial w_i} + \frac{\partial \mathcal{L}^F_{\mathrm{MSE}}}{\partial w_i}. \tag{6.17}$$

During the backpropagation of the softmax loss $\mathcal{L}_S(w_i, \alpha_i, p_i)$, the gradients first go to $\hat{w}_i$ and then to $w_i$. Thus, we formulate it as

$$\frac{\partial \mathcal{L}_S}{\partial w_i} = \frac{\partial \mathcal{L}_S}{\partial \hat{w}_i}\,\frac{\partial \hat{w}_i}{\partial w_i}, \tag{6.18}$$

where

$$\frac{\partial \hat{w}_i}{\partial w_i} =
\begin{cases}
2 + 2w_i, & -1 \le w_i < 0, \\
2 - 2w_i, & 0 \le w_i < 1, \\
0, & \text{otherwise},
\end{cases} \tag{6.19}$$

which is an approximation of twice the Dirac delta function [159]. Furthermore,
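This piecewise gradient is straightforward to realize as a custom autograd function; the sketch below assumes a hard sign binarization in the forward pass:

```python
import torch

class ApproxSign(torch.autograd.Function):
    """Sign binarization with the piecewise backward of Eq. 6.19."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        grad = torch.zeros_like(w)      # 0 outside [-1, 1)
        neg = (w >= -1) & (w < 0)
        pos = (w >= 0) & (w < 1)
        grad[neg] = 2 + 2 * w[neg]      # -1 <= w_i < 0
        grad[pos] = 2 - 2 * w[pos]      # 0 <= w_i < 1
        return grad_out * grad
```

In a binarized convolution, `ApproxSign.apply(w)` would then stand in for the hard sign so that gradients reach the latent real-valued kernels.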

$$\frac{\partial \mathcal{L}^K_{\mathrm{Adv}}}{\partial w_i} = \frac{1}{D(w_i; W_D)}\,\frac{\partial D}{\partial w_i}. \tag{6.20}$$

$$\frac{\partial \mathcal{L}^K_{\mathrm{MSE}}}{\partial w_i} = \lambda\,(w_i - \alpha_i \hat{w}_i)\,\alpha_i, \tag{6.21}$$

$$\frac{\partial \mathcal{L}^F_{\mathrm{Adv}}}{\partial w_i} = -\frac{1}{1 - D(a_i; W_D)}\,\frac{\partial D}{\partial a_i}\,\frac{\partial a_i}{\partial w_i}\,\mathbb{I}(i \in L), \tag{6.22}$$
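In an implementation, these adversarial gradients need not be hand-derived: backpropagating the corresponding log terms reproduces Eqs. 6.20 and 6.22 automatically. A minimal sketch for the kernel case, assuming the discriminator above (the same pattern applies to $\log(1 - D(a_L; W_D))$ for Eq. 6.22):

```python
import torch

def kernel_adv_grad(w, disc):
    """Gradient of log(D(w; W_D)) w.r.t. w, i.e., Eq. 6.20, obtained
    by autograd instead of the closed form (1/D) * dD/dw."""
    w = w.detach().requires_grad_(True)
    score = disc(w.flatten().unsqueeze(0))    # treat the kernel as one sample
    torch.log(score + 1e-8).sum().backward()  # eps is a numerical-safety assumption
    return w.grad
```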